ABSTRACT

A general method for testing nonlinearity in time series is described and applied to
different sets of pressure measurements related to the draft tube surge of a real Francis turbine.
By comparing the original time series with an ensemble of surrogate time series, suitably
constructed to mimic the linear properties of the original one, we were able to distinguish
linear stochastic from nonlinear deterministic behaviour and, moreover, to quantify the
degree of nonlinearity present in the related dynamics. Detecting nonlinear structure in
real data is considerably complicated by various contaminations, such as broadband noise
and/or long coherence times. These difficulties have been overcome by combining a suitable
nonlinear filtering technique with a qualitative redundancy analysis. The above
investigations allow a quantitative characterization of the different dynamical regimes of
motion of gas cavities inside real turbines and, moreover, support the reliability of some
related mathematical models.
1. INTRODUCTION



The analysis of physical quantities from experimental measurements, performed on real
dynamical systems, often assumes hypotheses that can be confirmed only a posteriori, following
the interpretative stage. This is a common situation when we work in a strictly linear
framework, such as classical Fourier analysis. In fact, irregular and broadband signals, generic
in nonlinear dynamical systems, have been discarded in the past as unwelcome "noise" or
random information. The characterization of these complex signals, and the extraction of
physically interesting and useful features, is the main goal of the new methods for the analysis
of real time series. Much of the current interest in nonlinear signal analysis arises from
the recognition that some complex nonlinear deterministic motions, called chaos for simplicity,
play a significant role in physical systems. This fact has led to completely new techniques for
time series analysis and consequently to new ways to understand and appreciate important,
i.e. physically significant, aspects of experimental data. Nonlinearity, or nonlinear determinism,
is a necessary condition for chaotic behaviour and, more generally, is a remarkable feature of
every real process.

The problem of deducing the dynamics of a given physical system from measured data
is a well known challenge for experimental analysts. The new concepts and techniques
involved in time series analysis belong to the framework of nonlinear dynamics and the theory of
deterministic chaos. In the next section we make some remarks about the analysis of time
series, with a particular emphasis on the detection of the underlying nonlinearity and the
problem of its reliable quantitative evaluation. Furthermore, we describe a practical
application of these concepts and techniques to different time series generated from
pressure measurements related to the draft tube surge of the real Francis turbine installed in the
NOVE Hydroelectric Power Plant of ENEL SpA (ITALY). The identification and quantification
of the related underlying nonlinear deterministic/chaotic dynamics allows a deeper insight into
the features of this real physical system and, moreover, gives useful information about some
possible mathematical models suggested in previous works [1],[2].



2. TIME SERIES ANALYSIS: DETECTING NONLINEARITY 

Since the time of Poincaré it has been recognized that even simple, but nonlinear,
physical systems can produce very complicated dynamical behaviours. This work mainly deals
with the problem of detecting nonlinearity in real time series; therefore it is convenient to spend
a few words on the concept of nonlinearity associated with time series.

Let x(t) be a given time series of N values taken at regular intervals of time:
$t = t_0,\ t_1 = t_0 + dt,\ \dots,\ t_{N-1} = t_0 + (N-1)\,dt$, where dt is the constant sampling
interval between successive measurements. Qualitatively, if the relation between two generic
successive values, x(t) and x(t+dt), is expressed by a nonlinear function, we can say that the
time series is nonlinear. A more rigorous definition, related to the practical methods for the
detection of nonlinearity, consists in the proof of an inconsistency with a linear stochastic
process. This approach, common in statistical analysis, is a proper test of data against a null
hypothesis, here a linear stochastic model [3]. The main goal of the analysis is to test whether
all the dependencies in a given time series can be explained on the linear level and, moreover,
to evaluate the quantitative differences, if any, between the original data under study and a
proper set of realizations of a linear stochastic process with the same linear properties. If we
detect some significant inconsistency with the assumed null hypothesis, we will consider the
time series to be nonlinear. We note that the detection of nonlinearity is a first step in a search
for chaotic behaviour, and for that reason many authors working on applied nonlinear dynamics
have recently proposed different approaches to this problem [4], [5].

A statistical test requires a well defined mathematical procedure for testing a given null
hypothesis (here a linear stochastic process) by evaluating some statistical quantity, the value
of which determines the confidence level at which the hypothesis can be rejected. Theiler et al.,
in 1992, suggested a way to obtain a quantitative statistic using the concept of "surrogate data" [6].
The basic idea is to artificially generate a set of time series which mimic some of the features of
the original data, except the properties which we are testing for. In the case of testing for
nonlinearity, the surrogate data should have the same Fourier spectrum and autocorrelation
function, i.e. the same linear properties, as the original time series under study. For that reason
these surrogates are generated as realizations of a linear stochastic process. The method starts
by applying a discrete Fourier transform operator, $\mathcal{F}$, to the original time series:

$$X(\omega) = \mathcal{F}\{x(t)\}$$

This complex Fourier transform can be rewritten as:

$$X(\omega) = A(\omega)\, e^{i\varphi(\omega)} \tag{1}$$

where $A(\omega)$ and $\varphi(\omega)$ are the amplitude and the phase respectively. To produce a
realization of a linear stochastic process, we randomize the phases at each sampled frequency by
an independent random variable, $\rho(\omega)$, uniformly distributed in the interval $[0, 2\pi)$:

$$X'(\omega) = A(\omega)\, e^{i[\varphi(\omega) + \rho(\omega)]} \tag{2}$$

In this way the original Fourier spectrum, i.e. the absolute values of the original Fourier
coefficients, is left invariant. The generic surrogate time series is finally given by the
application of an inverse Fourier transform:

$$x'(t) = \mathcal{F}^{-1}\{X'(\omega)\} = \mathcal{F}^{-1}\{A(\omega)\, e^{i[\varphi(\omega) + \rho(\omega)]}\} \tag{3}$$
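
The phase randomization described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `ft_surrogate` and the test signal are hypothetical, and the use of the real-input FFT (which enforces the conjugate symmetry needed for a real-valued surrogate) is an implementation assumption not discussed in the text.

```python
import numpy as np

def ft_surrogate(x, rng=None):
    """Return one phase-randomized surrogate of the real series x (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)                      # X(w) = A(w) exp(i phi(w))
    rho = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    rho[0] = 0.0                                   # leave the zero-frequency (mean) term unchanged
    if n % 2 == 0:
        rho[-1] = 0.0                              # the Nyquist term must stay real for even n
    # Amplitudes A(w) are kept, only the phases are shifted by rho(w).
    return np.fft.irfft(spectrum * np.exp(1j * rho), n=n)

# Hypothetical test signal: a noisy sine wave of 512 points.
rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(512)) + 0.1 * rng.standard_normal(512)
s = ft_surrogate(x, rng)
```

The surrogate `s` has the same amplitude spectrum as `x` but a different time evolution, which is the property the test relies on.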

By construction, the artificial time series (3) is a particular realization of a linear stochastic
process with the same spectrum and, by the Wiener-Khinchin theorem, with the same
autocorrelation function as the original data [5]. To test for nonlinearity, we need
a proper ensemble of surrogate time series (3) on which to perform a reliable statistic through some
quantity, Q, able to discriminate nonlinearity. In principle, any nonlinear statistic can be used,
as documented in the literature [8], [9]. In this work we used the Takens best estimator of
correlation dimension [7]:

$$Q = D_T = \frac{C(r_0)}{\displaystyle\int_0^{r_0} \frac{C(r)}{r}\, dr} \tag{4}$$

where $r_0$ is an upper cut-off, and C(r) is the correlation integral derived from the
Grassberger-Procaccia algorithm [10], in the extended version of Theiler [11]:

$$C(r) = \frac{2}{(N+1-w)(N-w)} \sum_{n=w}^{N-1} \sum_{i=0}^{N-1-n} \Theta\left(r - \lVert y_{i+n} - y_i \rVert\right) \tag{5}$$

where $\Theta$ is the Heaviside step function, w is the cut-off that excludes pairs of vectors
closer in time than w samples, and y is a vector in a proper reconstructed embedding space of
dimension m that preserves all the invariant characteristics of sets in the original phase space
[12], [13]. The embedding vectors in (5) are generated starting from a scalar quantity, x, using
the delay time technique:

$$y(t) = \left(x(t),\ x(t+\tau),\ \dots,\ x(t+(m-1)\tau)\right)^{T} \tag{6}$$

where $\tau$ is the delay time, i.e. the time interval between successive elements, chosen
such that the vector contains a maximum amount of information, without the components
becoming completely uncorrelated. The discriminating statistic, eq.(4), is then computed for
each of the surrogate time series and for the original one. If the numerical value obtained for
the original data is significantly different from those obtained for the ensemble of surrogate
data, then the null hypothesis of a linear stochastic process can be rejected and the original
data classified as nonlinear. As a measure of significance, i.e. of how significantly the
original data differ from the ensemble of surrogates, we use the number of sigmas (or standard
deviations) defined by:

$$S = \frac{\left| Q - \langle Q_{surr} \rangle \right|}{\sigma_{surr}} \tag{7}$$

where Q is the discriminating statistic computed for the original data, and $\langle Q_{surr} \rangle$
and $\sigma_{surr}$ are respectively the mean value and the standard deviation computed for the surrogates.
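
As an illustration of the delay embedding and of the Takens estimator described above, the following NumPy sketch may help. It relies on the known identity that the ratio of the correlation integral at the cut-off to the integral of C(r)/r equals minus the inverse of the mean of ln(r/r0) over all pair distances r below the cut-off r0. The function names, the maximum norm, and the test data are assumptions made for this sketch; the text does not specify the norm used.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Rows are y(t) = (x(t), x(t+tau), ..., x(t+(m-1)tau)); tau is given in samples."""
    x = np.asarray(x, dtype=float)
    n_vec = x.size - (m - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n_vec] for k in range(m)])

def takens_estimator(y, r0, w=1):
    """Takens estimator from all pairs at least w samples apart (w >= 1) and closer than r0.

    Returns NaN if no pair distance falls below r0.
    """
    n = len(y)
    logs = []
    for lag in range(w, n):
        # Maximum-norm distances between vectors separated by `lag` samples (assumed norm).
        d = np.max(np.abs(y[lag:] - y[:-lag]), axis=1)
        d = d[(d > 0.0) & (d < r0)]
        logs.append(np.log(d / r0))
    logs = np.concatenate(logs)
    return -1.0 / logs.mean()

# Hypothetical check: uniform white noise embedded in two dimensions fills a square,
# so the estimate is expected to lie close to 2.
rng = np.random.default_rng(1)
noise = rng.uniform(size=1000)
d_t = takens_estimator(delay_embed(noise, m=2, tau=1), r0=0.1, w=2)
```

The loop over time lags makes the exclusion of temporally close pairs explicit, at the price of a cost that grows with the square of the series length.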

Formally, the above method provides a measure of statistical confidence, in terms of
probability, that the null hypothesis is false [14]. More precisely, the null hypothesis (here a
linear stochastic process) is rejected, and thus the result is considered significant, if the
residual probability, p, of the above hypothesis is lower than a chosen critical level. In this work
we assume a statistic with a t-distribution, because of the limited number, $N_s$, of surrogate
realizations (typically $N_s = 10$), giving a residual probability p<0.01 for values of the statistic
greater than 2.821 [15] (the one-sided 99% point of the t-distribution with $N_s - 1 = 9$ degrees
of freedom). In principle, the Fourier-transform based surrogates method works
very well but, in practice, there are many potential pitfalls and problems that can lead to false
detection of nonlinearity [16]. Unlike analytical data, real signals are contaminated by
many external perturbations, such as noise, and are also measured with finite precision and
recorded in finite sets. The situation is further complicated when the data exhibit long
coherence times. Following Theiler et al. [16], we formalize the concept of coherence time as
the time $\tau$ such that the absolute value of the autocorrelation function is smaller than
some pre-specified value $\varepsilon$ for all $t > \tau$. When a given time series exhibits long
coherence times, the FT-based surrogates method, contrary to the theoretical expectation, does
not correctly mimic all the linear properties of the original time series, and thus it can lead to
an incorrect or false nonlinear statistic. In order to overcome this unwelcome feature, Palus [15]
suggests testing the differences between the original data and the related surrogates on both the
linear and nonlinear levels, to guarantee a reliable nonlinear statistic. In this work, to avoid
spurious results due to linear deviations in the ensemble of surrogate data, we initially performed
a linear redundancy analysis, both on the original data and on the surrogates ensemble.
Following Palus, we define the linear redundancy of an arbitrary n-dimensional random
variable, $x_1, \dots, x_n$, by the formula:

$$L(x_1, \dots, x_n) = \frac{1}{2} \sum_{i=1}^{n} \log(c_{ii}) - \frac{1}{2} \sum_{i=1}^{n} \log(\sigma_i) \tag{8}$$

where $c_{ii}$ and $\sigma_i$ are the diagonal elements (variances) and the eigenvalues, respectively, of
the covariance matrix, C, describing the mutual linear dependencies of the variables [15].
Through the computation of linear redundancies (8), we were able to select the optimal generic
realization of the original data that guarantees the linear compatibility with the related
ensemble of surrogates, and thus the reliability of results from the nonlinear statistic analysis.
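
A minimal sketch of the linear redundancy computation follows, under the assumption that the n variables are time-lagged copies of a single scalar series (a common choice, but one the text leaves open). The function name `linear_redundancy` and the two test series are hypothetical.

```python
import numpy as np

def linear_redundancy(x, n_dim, tau=1):
    """Linear redundancy of n_dim lagged copies of x (lag tau in samples)."""
    x = np.asarray(x, dtype=float)
    n_vec = x.size - (n_dim - 1) * tau
    # Each row is one variable: x(t), x(t+tau), ..., x(t+(n_dim-1)tau).
    lagged = np.vstack([x[k * tau : k * tau + n_vec] for k in range(n_dim)])
    cov = np.cov(lagged)                           # covariance matrix C
    eig = np.linalg.eigvalsh(cov)                  # eigenvalues of the symmetric matrix C
    return 0.5 * np.sum(np.log(np.diag(cov))) - 0.5 * np.sum(np.log(eig))

# Hypothetical comparison: white noise versus a strongly correlated AR(1) series.
rng = np.random.default_rng(2)
white = rng.standard_normal(4000)
ar1 = np.zeros(4000)
for t in range(1, 4000):
    ar1[t] = 0.9 * ar1[t - 1] + white[t]
l_white = linear_redundancy(white, n_dim=2)
l_ar1 = linear_redundancy(ar1, n_dim=2)
```

The redundancy is close to zero for linearly independent variables and grows with linear dependence, so comparing its value for the original data and for each surrogate indicates whether the surrogates preserve the linear structure.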

Practical applications of the above nonlinear statistic analysis often need reliable
estimates of mathematical quantities, like correlation dimensions, that are strongly affected
by the presence of broadband noise [11], [13]. The aim is the maximum reduction of the
contaminating noise sources without a significant distortion of the original signal. In this work
we further enhance the effectiveness of the filtering procedure successfully used in a previous
study [13], extending from the linear to the nonlinear framework the class of filters proposed
by Schreiber and Grassberger (1991) [17]. In this case the recurrent transformation for the
filtered values, $x^{*}$, is expressed in the embedding space:



As in the linear case, the optimal values of the coefficients, $a_j^{(n)}$, $b^{(n)}$, are obtained by a least
squares procedure restricted to a neighborhood of the embedding vector to be corrected
[13],[17].

The whole procedure for detecting nonlinearity in real data adopted in the present work
may be summarized as follows:

- Construction of an experimental time series from N measurements of a selected
representative scalar quantity (e.g. absolute pressure data);

- Partitioning of the original data into a set of k sub-series of a given length n ($= 2^j \geq 512$): $nk \leq N$;

- For each sub-series, $i_k = 1, 2, \dots, k$, we generate s (about 10) surrogates and we check, using a
linear redundancy analysis, whether the linear characteristics of the original data match those of
the surrogates ensemble;

- For each current collection (original sub-series + related surrogates), we compute the Takens
best estimator of correlation dimension, Q, over a range of different embedding dimensions
$m_1, m_2, \dots, m_j$ (here about 9): $Q^{(0)}(i_k, m),\ Q^{(1)}(i_k, m),\ \dots,\ Q^{(s)}(i_k, m)$;
$i_k = 1, 2, \dots, k$; $m = m_1, \dots, m_j$;

- Estimation of the significance of the nonlinear statistic, to measure how significantly the original
data differ from the related surrogates ensemble (see (7)): $S(i_k, m) = f(Q^{(a)}(i_k, m))$; $a = 0, 1, \dots, s$;
$i_k = 1, \dots, k$; $m = m_1, \dots, m_j$;

- Evaluation of the mean value, computed over the k original sub-series, of the sigmas, S, for
each embedding dimension: $S(m) = \langle S(i_k, m) \rangle_{i_k}$, $m = m_1, \dots, m_j$;

- Evaluation of the nonlinearity degree for the original series computed, for example, as the
mean value over the embedding dimensions: $\mu = \langle S(m) \rangle_{m}$.

Of course, if there is evidence of stochastic behaviour due to the presence of contaminating
broadband noise in the original data, we apply a proper filtering procedure to clean the data.
Moreover, the above analysis may be completed by an estimation of the principal invariant
dynamical quantities, like Lyapunov exponents, Kolmogorov entropy, etc., for a better
characterization of the observed process.
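
The core of the procedure summarized above (partitioning, surrogate generation, statistic, sigmas, averaging) can be sketched as follows. To keep the example short, a simple time-reversal asymmetry statistic is used as a stand-in for the Takens estimator, and the loops over embedding dimensions and the linear redundancy check are omitted. All function names and the logistic-map test series are hypothetical choices made for this sketch.

```python
import numpy as np

def ft_surrogate(x, rng):
    """Phase-randomized surrogate with the same amplitude spectrum as x."""
    spectrum = np.fft.rfft(x)
    rho = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    rho[0] = 0.0
    if x.size % 2 == 0:
        rho[-1] = 0.0
    return np.fft.irfft(spectrum * np.exp(1j * rho), n=x.size)

def reversal_asymmetry(x):
    """Stand-in nonlinear statistic (not the Takens estimator used in the text)."""
    d = np.diff(x)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def sigmas(q_orig, q_surr):
    """Number of sigmas separating the original statistic from the surrogates."""
    return abs(q_orig - np.mean(q_surr)) / np.std(q_surr, ddof=1)

def subseries_sigmas(x, n_sub, n_surr=10, statistic=reversal_asymmetry, seed=0):
    """Sigmas for each of the k = len(x) // n_sub consecutive sub-series."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    values = []
    for i in range(x.size // n_sub):
        sub = x[i * n_sub : (i + 1) * n_sub]
        q_surr = [statistic(ft_surrogate(sub, rng)) for _ in range(n_surr)]
        values.append(sigmas(statistic(sub), q_surr))
    return np.array(values)

# Hypothetical nonlinear deterministic test series: the logistic map.
series = np.empty(2048)
series[0] = 0.3
for t in range(1, series.size):
    series[t] = 4.0 * series[t - 1] * (1.0 - series[t - 1])

s = subseries_sigmas(series, n_sub=512)
degree = s.mean()          # mean number of sigmas over the sub-series
```

The mean value `degree` can then be compared with the critical value 2.821 quoted in the text for ten surrogates.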



3. APPLICATION TO REAL DATA: EXPERIMENTAL MEASUREMENTS 

The method described above has been applied to different time series generated by
absolute pressure measurements taken inside the NOVE hydroelectric power plant of ENEL
SpA (the Italian National Board of Electricity) [13]. This plant is equipped with a vertical Francis
turbine with a maximum power output of 72 MW and a maximum flow rate of 80 m³/s. The
runner size is 2.75 m, and its rotational speed is 230.8 rpm (3.84 Hz), whereas the draft tube
is 21.6 m long. Figure 1 shows the selected location of the fluid (water) pressure transducers.
In particular, a first set of measurements is related to the point labelled 7 in the above figure,
just below the runner (sample PR180), whereas a second set of measurements is related to
the point labelled 9, inside the divergent (sample PDB). The operating conditions of the
turbine during the measurements correspond to an intermediate electric power rate of 30 MW,
where anomalous dynamic behaviours have been detected. Previous analyses [13] showed
clear evidence of deterministic chaotic behaviour in the irregular oscillations of gas cavities
inside the Francis turbine. The aim of the present work is to complete the cited analysis by
quantifying the degree of nonlinearity of the related dynamics. Moreover, to shed more light
on these complex phenomena, we compared the results of the nonlinear analysis with
another set of new measurements, performed after some mechanical modifications had been
introduced in order to reduce the principal dynamic instabilities of the turbine at partial loads.
These modifications consist of a conic-cylindrical appendix attached to the end of the
turbine shaft (suffix OFF), and an air flow inlet located at specific positions on the turbine shaft
(suffix ON) (see figure 2). The original analog electric signals, recorded on high speed
magnetic tapes by ENEL SpA (DPT-SMP - Venice), have been converted and recorded in a
suitable digital format through a special device available at CISE laboratories.



4. APPLICATION TO REAL DATA: ANALYSIS OF RESULTS 

The nonlinearity analysis of the time series PR180 (test series), already explored in a previous
work [13] and performed on different filtered data (the original series, indeed, showed a typical
random, somewhat correlated, behaviour due to distributed noise), shows an increase of the
coherence time due to the filtering procedure. This feature is clearly pointed out by the linear
redundancy analysis, and it requires a proper choice of the original sub-series length to avoid
spurious detection of nonlinearity (figures 3,4,5). The nonlinear statistic, eq.(4), on the
unfiltered data clearly shows that the underlying dynamics is compatible with a linear
stochastic process, with strong evidence of noise contamination [13], [16]. Moreover, the
significance, eq.(7), suggests in this case a very low mean value of the nonlinearity
degree: μ=0.88±0.62, where 95% is the assumed confidence interval (figures 6,7). A
comparison with the residual probability threshold value (p<0.01) shows that the null
hypothesis of a linear stochastic process cannot be rejected. On the other hand, the application
of a filtering procedure to the above real data changes both qualitatively and quantitatively the
features of the nonlinear statistic. Now the sigmas are well beyond the critical threshold value,
clearly showing the presence of a nonlinear deterministic behaviour: μ=24.33±8.94 (figures
8,9). This first result supports some previous indications according to which the observed
stochastic behaviour of the original unfiltered series is only apparent, being mainly due to
broadband noise contamination.

Starting from the above result, we systematically applied the nonlinearity analysis to all
the new time series, recorded in the presence of the above cited mechanical modifications. Here
the sampling time was dt=0.02 s (50 Hz), spanning a time interval of 43 seconds (2150 points).
From the time evolution of the four signals (PR180 OFF, PDB OFF; PR180 ON, PDB ON), we
note a more or less complex structure of the data, with a broadband frequency content. A qualitative
inspection of the original signals, afterwards supported by a quantitative Fourier analysis,
suggests an increase of the degree of dynamical complexity in the case of the air flow inlet
configuration (suffix ON) (figures 10,11). This feature is also confirmed by the autocorrelation
analysis, showing a decrease of the autocorrelation time [18], or equivalently of the average
cycle time [19], for the air flow inlet case (see table 1). As summarized in table 2, all the
original unfiltered time series exhibit a behaviour compatible with a linear stochastic process.
The linear redundancy analysis shows that the surrogates preserve the linear properties well for
sub-series lengths of 512 points, indicating low coherence times. Moreover, the nonlinearity
statistic analysis clearly shows no significant differences between original and surrogate data
(figures 12,13).
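
The coherence time invoked here was defined in section 2 as the time beyond which the absolute value of the autocorrelation function stays below a pre-specified value. A rough numerical sketch of that definition is given below. The threshold of 0.1, the restriction to a maximum lag, the function names, and the synthetic AR(1) test series are all assumptions of this sketch; only the sampling interval dt=0.02 s is taken from the text.

```python
import numpy as np

def autocorrelation(x):
    """Normalized sample autocorrelation for lags 0 .. len(x)-1 (biased estimate)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    spec = np.fft.rfft(x, 2 * n)                   # zero padding avoids circular wrap-around
    acov = np.fft.irfft(np.abs(spec) ** 2)[:n]
    return acov / acov[0]

def coherence_time(x, dt, eps=0.1, max_lag=None):
    """Largest lag (in time units) at which |autocorrelation| is still >= eps."""
    rho = autocorrelation(x)
    if max_lag is not None:
        rho = rho[: max_lag + 1]
    above = np.flatnonzero(np.abs(rho) >= eps)     # always contains lag 0 for eps <= 1
    return above[-1] * dt

# Hypothetical test series: an AR(1) process whose autocorrelation decays as 0.9**lag.
rng = np.random.default_rng(3)
noise = rng.standard_normal(50000)
ar1 = np.zeros(noise.size)
for t in range(1, noise.size):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]

tc = coherence_time(ar1, dt=0.02, eps=0.1, max_lag=200)
```

For short or strongly correlated records the autocorrelation estimate at large lags is noisy, so in practice the threshold and the maximum lag would have to be chosen with care.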

After the application of the Schreiber-Grassberger nonlinear filter, eq.(9), the increased
coherence times, evidenced by the linear redundancy analysis, forced us to extend the generic
sub-series minimum length to 1024 points. For the filtered time series the nonlinear statistic
shows a typical feature of nonlinear deterministic systems (i.e. saturation to a finite value of the
Takens best estimator of correlation dimension), with significant differences between original
and surrogate data (figures 14,15). As shown in table 2, the significance sigmas are now well
beyond the critical threshold value, allowing the safe rejection of the null hypothesis of a linear
stochastic model. We note that in the case of series with the suffix ON (air flow inlet), the
numerical estimate of the nonlinearity degree is quite reliable, whereas in the case of the suffix
OFF there is a high uncertainty of the mean values inside the assumed 95% confidence
interval. However, as a general indication, the above results suggest a decrease of the
nonlinearity degree for the air flow inlet configuration, with a possible increase of the dynamical
complexity due to the presence of more significant stochastic components.

To complete our analysis we also computed some important invariant quantities, like
the Kolmogorov entropy, the Lyapunov spectrum, etc.; the related results are summarized
in table 3 (see [13]). This information sheds more light on the degree of dynamical complexity
of the signals, and points out its relation with the nonlinearity degree. In fact, the main
result emerging from these computations is that the air flow inlet involves an increase of the
dynamical instability when the dynamics is initially a combination of nonlinear deterministic
and linear stochastic contributions, whereas the air flow inlet reduces the dynamical
complexity when the dynamics is initially governed only by nonlinear deterministic components
(figure 16).



5. CONCLUSIONS 

We have presented some experimental applications of the method for testing
nonlinearity in time series, utilizing linear redundancy and FT-based surrogates nonlinear
statistic analyses. This work, combined with previous analyses, strongly supports the suggested
mathematical picture of the dynamical behaviour of gas cavities inside the draft tube of
Francis turbines [1],[2],[13]. The basic idea is that an adequate and effective mathematical
description of such physical systems must belong to the framework of nonlinear deterministic
differential equation models with a low dimensionality, rather than to that of linear stochastic
models with infinite degrees of freedom. We hope that the results coming from these new approaches
can contribute to a better understanding of the complex nature of these
real phenomena.
TABLE 1

Legend: Series = Coded Name of Experimental Data; Dt = Original Time Sampling (s); Tac = Autocorrelation Time; Cycle Time = Average Cycle Time;
Series; F = Nonlinear Filtering Process of data.

TABLE 2

Legend: Series = Coded Name of Experimental Data; Red. Opt. Ser. Length = Optimal Length of Subseries from Linearized Redundancy Analysis; Takens
Best Est. = Dimension Estimation from Best Takens Estimator; Nonlinear Degree (Sigmas) = Nonlinearity Degree from Surrogate Series Analysis.



Series         Limit Cut-Off   Max. Cut-Off   Corr.   Emb.   Max. Lyap.   Lyap.   Kolm.
               W∞              Used Win       Dim.    Dim.   Exponent     Dim.    Entropy

PR180 OFF      13              3
PR180 ON
PDB OFF        3
PDB ON

PR180 OFF F    52              13             3.1     5      2.23         2.9     2.23
PR180 ON F     44              10             6.1     7      22.0         6.58    38.4
PDB OFF F      10              3              4.3     7      10.9         6.53    20.2
PDB ON F       23              5              4.6     7      4.0          3.5     4.0


TABLE 3

Legend: Series = Coded Name of Experimental Data; Corr. Dim. = Correlation Dimension Estimation from Grassberger-Procaccia Me…
… Dim. = Lyapunov Dimension Estimation from Kaplan-Yorke Conjecture; Kolm. Entropy = Sum of the positive Lyapunov Exponents (Pesin).
Figure 16. Nonlinearity/Complexity Model



